ABSTRACT


Over the past decade, machine learning has become an integral part of educational technologies. With more and more applications such as students’ performance prediction, course recommendation, dropout prediction and knowledge tracing relying upon machine learning models, there is increasing evidence of, and concern about, bias and unfairness in these models. Unfair models can lead to inequitable outcomes for some groups of students and negatively impact their learning. We show by real-world examples that educational data has embedded bias that leads to biased student modeling, which calls for the development of fairness formalizations and fair algorithms for educational applications. Several formalizations of fairness have been proposed that can be classified into two types: (i) group fairness and (ii) individual fairness. Group fairness guarantees that groups are treated fairly as a whole, which might not be fair to some individuals. Thus individual fairness has been proposed to make sure fairness is achieved at the individual level. In this work, we focus on developing an individually fair model for identifying students at risk of underperforming. We propose a model based on the idea that the prediction for a student (identifying at-risk students) should not be influenced by his/her sensitive attributes. The proposed model is shown to effectively remove bias from these predictions, hence making them useful in aiding all students.


Keywords 
Fairness, at-risk students detection, decision making, stu- 
dent modeling 


1. INTRODUCTION 


Educational data mining (EDM) approaches seek to analyze student-related data with the objective of improving learning outcomes for students. Many machine learning methods have been proposed for student modeling and forecasting. However, in the past few years, concerns have emerged about the fairness of machine learning models. An investigation by ProPublica found that COMPAS, a machine learning tool used to predict the risk of recidivism, exhibits alarming bias against African-American defendants: the false positive rate for African-American defendants is nearly twice that of their white counterparts (45% vs. 23%) [1]. Buolamwini et al. [3] observed imbalanced gender and skin type distributions in facial recognition datasets. Their study shows that facial recognition algorithms are more likely to misclassify darker-skinned females, with error rates up to 34.7%, while the maximum error rate for light-skinned males is 0.8%. In health care, an algorithm used to guide health decisions was found to be biased: African-American patients assigned the same level of risk are sicker than white patients [24].


In the domain of EDM, unfairness has also been observed. In 
academic performance prediction systems, social indicators 
have been found to predict low-performance of male students 
more accurately than that of female students [29]. A study 
by Doroudi et al. [7] showed that although personalized 
models were more equitable than treating all students the 
same, they were still not fair when relying on inaccurate 
models and the inequities could cascade as the amount of 
content increases. 

Figure 1: GPA distribution by gender.

Machine learning models learn from data. If bias is recorded in data, models trained on the biased data can also be biased [3]. Bias is also observed in educational data. Figures 1 and 2 show the average GPA of students by gender and race at George Mason University over a period of ten years. The GPA of a student is his/her cumulative GPA as of the last term before graduation. In Figure 1, the average GPA of male students is skewed towards lower GPAs, while the average GPA of female students is skewed towards higher GPAs. The overall average GPA of female students is 3.15, which is higher than that of male students (2.86). Figure 2 shows the average GPA of African-American and non-African-American students. From the figure, we can observe that the average GPA of African-American students leans towards the left, while that of non-African-American students leans towards the right. The data shows that the average GPA of African-American students is 2.86, while it is 3.03 for non-African-American students.

Figure 2: GPA distribution by race.


Biased data can lead to biased machine learning models, which can be harmful to minority groups. For example, models predicting a group of students to be at-risk or underperforming can discourage them and undermine their learning outcomes. To address the harm caused by inequitable machine learning, there is a critical need to develop fair machine learning algorithms.


In this work, we build a fair machine learning model based on metric free individual fairness. Metric free individual fairness assumes that an individual’s qualification should not be changed if his/her sensitive attribute is changed [19]. In this paper, without loss of generality, we assume that the sensitive attribute is binary, so there are two sensitive groups. The proposed model is composed of two classifiers, each corresponding to a sensitive group. The classifier corresponding to the individual’s sensitive attribute predicts the individual’s probability of being positive, while the output of the other classifier indicates the individual’s probability of being positive if his/her sensitive attribute is changed. According to the definition of metric free individual fairness, the two probability distributions should be nearly identical. The proximity of the two probability distributions is treated as the measure of fairness: the closer the two distributions, the fairer the prediction. In addition to fairness, we also care about the accuracy of the classifier. Therefore, the overall objective we seek to optimize combines the accuracy of the classifier corresponding to the individual and the proximity of the distributions of the two classifiers.


The proposed model is evaluated on datasets collected from George Mason University, and the task is detecting at-risk students. The experimental results show the efficacy of the proposed model at mitigating bias. Although the overall data shows that female and non-African-American students have higher overall performance, we observe that the bias differs across courses. Specifically, in some courses female students belong to the disadvantaged group, while in other courses male students are in the disadvantaged group. This observation is useful for future work on developing fair machine learning models in educational settings.


The rest of the paper is organized as follows. Section 2 discusses related work on EDM and fairness. Section 3 introduces preliminaries on the definition of individual fairness. In Section 4, we propose our fair model for at-risk student detection. Datasets and the experimental protocol are described in Section 5. Section 6 presents experimental results and analysis. Section 7 concludes the paper and discusses future work.


2. RELATED WORK 

In this work, we focus on mitigating bias in classification 
tasks. We first describe related works in EDM that rely 
on classification. Then we describe the formalizations of 
fairness. Lastly, we talk about proposed methods for fair 
machine learning. 


2.1 Classification Problems in EDM 


In educational data mining, there are many tasks that can 
be formulated as a classification problem and several prior 
works have been proposed in this area such as affect detec- 
tion [30], dropout prediction [4], graduation prediction [20], 
at-risk student detection [17, 28], knowledge tracing [31], 
etc. 


Affect detection is the task of classifying a student’s affective states such as boredom, confusion, delight, concentration and frustration by using sensor [26] and sensor-free [2] data. Vinayak et al. [15] proposed to predict student dropout using a Naive Bayes classifier. Ojha et al. [25] proposed SVMs, Gaussian Processes and Deep Boltzmann Machines for students’ graduation prediction using factors such as pre-university preparation. A set of human-interpretable features has been engineered by Polyzou et al. [28] for at-risk student detection. All these tasks can be formulated as a classification problem. However, none of these works considered the potential bias and discrimination of the models. In this work, we try to build a general method that can be used for different kinds of tasks. To test the proposed method, we focus on the task of identifying at-risk students.


2.2 Fairness Formalizations 

Over the years, different formalizations of fairness have been proposed that focus on different aspects. For example, statistical parity [11] requires that the probability of being predicted as positive should be nearly the same across all groups. Equalized odds imposes the constraint that the true positive rate and the false positive rate should be the same for all groups [14]. Equal opportunity relaxes this to the true positive rate only, requiring that a qualified individual be predicted as qualified regardless of his/her sensitive attribute [14]. Another type of fairness formalization focuses on the individual level. The notion of individual fairness proposed by Dwork et al. [8] assumes that similar individuals should be treated similarly. However, the requirement of a problem-specific similarity metric limits its adoption [5]. Hu et al. [19] proposed metric free individual fairness based on the assumption that the prediction outcome of an individual should not be influenced by the individual’s sensitive attribute. The elimination of the similarity metric makes metric free individual fairness easier to implement.
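
To make the group-level criteria concrete, the following minimal sketch compares the true positive rate and the false positive rate of two groups on toy data. The helper name `group_rates`, the toy arrays and the use of absolute gaps are assumptions made for illustration; they are not taken from [11] or [14].

```python
def group_rates(y_true, y_pred, groups, g):
    """Return (true positive rate, false positive rate) for group g."""
    tp = fp = pos = neg = 0
    for y, p, s in zip(y_true, y_pred, groups):
        if s != g:
            continue
        if y == 1:
            pos += 1
            tp += p
        else:
            neg += 1
            fp += p
    return tp / pos, fp / neg


# Hypothetical labels, predictions and binary group membership.
y_true = [1, 0, 1, 0, 1, 0, 1, 1]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

tpr0, fpr0 = group_rates(y_true, y_pred, groups, 0)
tpr1, fpr1 = group_rates(y_true, y_pred, groups, 1)
tpr_gap = abs(tpr0 - tpr1)  # zero when the true positive rates match
fpr_gap = abs(fpr0 - fpr1)  # zero when the false positive rates match
```

A classifier can have a small gap in one rate and a large gap in the other, which is why the two rates are reported separately here.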


2.3 Fair Machine Learning Algorithms

Several algorithms have been proposed to achieve individual fairness. Based on John Rawls’ notion of fair equality of opportunity, Joseph et al. [21] proposed an individual fairness notion that a worse individual should never be favored over a better one. Unfairness comes from the prediction’s dependence on the sensitive attribute. To remove the dependence, Zemel et al. [32] proposed learning a fair representation which does not contain sensitive information. The representation is a cluster of embedding vectors. Following the idea of learning a fair representation, Edwards et al. [9] proposed to remove sensitive information from the learned representation by using adversarial learning. The input feature vectors are mapped to an embedding vector by an encoder. An adversary tries to predict the sensitive attribute from the representation. The encoder and the adversary play a minimax game to remove sensitive information. The fair representation learning algorithms achieve individual fairness by first learning a representation and then training a classifier based on the learned representation. In contrast, our proposed model directly puts fairness constraints on the predictions.


3. PRELIMINARIES 


In this section, we discuss the formalization of individual 
fairness. 


3.1 Individual Fairness 

Dwork et al. [8] introduced the concept of individual fairness, which is based on the idea that similar individuals should be treated similarly. This definition requires a similarity metric measuring the similarity between two individuals. Given two individuals $x_i$ and $x_j$, a classifier $H$ is individually fair if the difference between the predictions for the two individuals is upper bounded by their dissimilarity. The definition is as follows

$$D(H(x_i), H(x_j)) \le d(x_i, x_j) \qquad (1)$$

where $D$ is the distance measure between the outputs of the classifier and $d$ is the distance metric between the two individuals. The drawback of this definition is that a similarity metric is required. A similarity metric guaranteeing fairness is problem specific and requires strong assumptions, which obstructs its adoption [5].
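
As an illustration of how the condition in Equation 1 could be checked on data, the sketch below lists the pairs of individuals that violate it. The choices of Euclidean distance for d and absolute difference of predicted probabilities for D, as well as the toy points and scores, are assumptions for this example only; picking d is exactly the problem-specific step discussed above.

```python
import math
from itertools import combinations


def violations(points, scores, d, D):
    """Return index pairs (i, j) with D(H(x_i), H(x_j)) > d(x_i, x_j)."""
    return [
        (i, j)
        for i, j in combinations(range(len(points)), 2)
        if D(scores[i], scores[j]) > d(points[i], points[j])
    ]


def euclidean(a, b):
    """Assumed similarity metric d between two individuals."""
    return math.dist(a, b)


def abs_diff(p, q):
    """Assumed distance D between two classifier outputs."""
    return abs(p - q)


points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0)]  # hypothetical feature vectors
scores = [0.20, 0.60, 0.90]                    # hypothetical outputs H(x_i)
bad_pairs = violations(points, scores, euclidean, abs_diff)
```

Here the first two individuals are close in feature space but receive very different scores, so they form the only violating pair.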


3.2 Metric Free Individual Fairness 

Hu et al. [19] proposed metric free individual fairness based on the idea that the qualification of an individual should not be influenced by his/her sensitive attribute. Thus, changing an individual’s sensitive attribute should not change the prediction of a classifier. The definition of metric free individual fairness is as follows

$$D(P(Y \mid x_i, S = s_i), P(Y \mid x_i, S \ne s_i)) \le \epsilon \qquad (2)$$


where $s_i$ is the sensitive attribute of individual $i$, $D$ is the distance measure of the predictions, and $\epsilon$ is an arbitrarily small positive number. This definition eliminates the requirement of a similarity measure between individuals. In this work, we develop a fair model based on this definition.

Figure 3: The architecture of the proposed model. The model consists of two classifiers $C_0$ and $C_1$ corresponding to sensitive attribute 0 and 1. An input vector $x_i$ is fed into the two classifiers and the outputs are used to compute accuracy and fairness score. Note that if the sensitive attribute $s_i$ is 0, accuracy $A_0$ and fairness $F$ are combined to compute objective $O_0$ and only classifier $C_0$ is updated; otherwise, accuracy $A_1$ and fairness $F$ are combined to form objective $O_1$ and classifier $C_1$ is updated.


4. METHODS 
4.1 Problem Statement 


In this work, we focus on the task of identifying at-risk students. Given a student $i$ with $((x_i, s_i), y_i)$, $x_i \in \mathbb{R}^d$ encodes the student’s grades in courses taken prior to the target course; $s_i \in \{0, 1\}$ is the student’s sensitive attribute such as gender or race; $y_i \in \{0, 1\}$ is the ground truth label indicating whether a student is at-risk (1) or not (0). We focus on a binary sensitive attribute, though our method can be easily extended to scenarios where the sensitive attribute is n-ary. We want to build a classifier to predict if a student will underperform in a future target course. The classifier needs to satisfy two constraints: 1) its predictions are as accurate as possible and 2) its output is individually fair as specified by Equation 2.


The model is trained in a course-specific manner, namely, we train a model for each target course. Given a target course, we extract all the students who have taken it. The courses these students have taken prior to the target course are extracted as prior courses. The students’ grades in the prior courses are extracted to form a matrix $X$ and the students’ grades in the target course are $Y$. Students’ sensitive attributes are denoted as $S$. We train a course-specific model on $(X, Y)$ to predict whether students who have not yet taken the target course will fail it. Note that the sensitive attributes $S$ are not used as features.
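
The course-specific data construction can be sketched as follows. This is a minimal illustration rather than the authors' preprocessing code: the record layout `(student, course, term, grade)`, the function name `build_course_dataset`, the use of 0.0 for a prior course a student has not taken, and the toy records are all assumptions. The labels are binarized with the at-risk threshold of 3.0 given in Section 5.1.

```python
def build_course_dataset(records, sensitive, target, threshold=3.0):
    """Assemble prior-course features X, at-risk labels Y and attributes S.

    records: list of (student, course, term, grade) tuples (assumed layout).
    sensitive: dict mapping student -> 0 or 1.
    """
    target_term = {s: t for s, c, t, g in records if c == target}
    target_grade = {s: g for s, c, t, g in records if c == target}
    # Prior courses: anything a target-course student took in an earlier term.
    prior = sorted({c for s, c, t, g in records
                    if s in target_term and c != target and t < target_term[s]})
    col = {c: k for k, c in enumerate(prior)}
    students = sorted(target_term)
    row = {s: r for r, s in enumerate(students)}
    X = [[0.0] * len(prior) for _ in students]  # assumption: 0.0 = not taken
    for s, c, t, g in records:
        if s in row and c in col and t < target_term[s]:
            X[row[s]][col[c]] = g
    Y = [1 if target_grade[s] < threshold else 0 for s in students]
    S = [sensitive[s] for s in students]  # kept aside, never added to X
    return prior, X, Y, S


# Hypothetical grade records for three students.
records = [
    ("a", "CS101", 1, 3.7), ("a", "CS211", 2, 2.3),
    ("b", "CS101", 1, 2.0), ("b", "MATH1", 1, 3.0), ("b", "CS211", 3, 3.3),
    ("c", "CS101", 1, 4.0),
]
sensitive = {"a": 0, "b": 1, "c": 0}
prior, X, Y, S = build_course_dataset(records, sensitive, "CS211")
```

Student "c" has not taken the target course and is therefore excluded from the training data for it.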


4.2 Proposed Algorithm 
In this section, we present the proposed model, the multiple cooperative classifier model (MCCM). Figure 3 shows the architecture of the proposed model. The model is composed of two classifiers, each of which corresponds to a value of the sensitive attribute, e.g., male or female. Given an individual $((x_i, s_i), y_i)$, the feature vector $x_i$ is fed into the two classifiers. The output of the classifier corresponding to $s_i$ is the individual’s probability of being positive, while the output of the classifier corresponding to $1 - s_i$ is the individual’s probability of being positive if his/her sensitive attribute is changed. Based on the assumption of metric free individual fairness, to be fair the difference between the outputs of the two classifiers should be negligible. In this work, the difference is the KL-divergence of the two outputs. In addition to fairness, we also care about the accuracy of the classifier. Therefore, for student $i$, the objective function we seek to optimize is as follows

$$L_i = -y_i \log p_{s_i,i} - (1 - y_i) \log(1 - p_{s_i,i}) + \lambda \, \mathrm{KL}(p_{s_i,i} \,\|\, p_{1-s_i,i}) \qquad (3)$$

where $\lambda$ is a hyperparameter trading off between accuracy and fairness, $p_{s_i,i}$ is the probability of being positive predicted by classifier $s_i$ and $p_{1-s_i,i}$ is the probability predicted by classifier $1 - s_i$. Note that, for $L_i$, only the classifier corresponding to $s_i$ is updated. The classifiers are feed-forward neural networks with two hidden layers. The activation function is chosen to be ReLU [12]. Dropout [16] is used to prevent overfitting.

433 Proceedings of The 13th International Conference on Educational Data Mining (EDM 2020)


Algorithm 1: Multiple Cooperative Classifier Model

Input: Data $D = \{((x_i, s_i), y_i)\}_{i=1}^{N}$, learning rate $\alpha$, $\lambda$, number of iterations $T$, classifiers $C_0$ and $C_1$.

Initialize parameters $\{\theta_0, \theta_1\}$
for $t = 1, \ldots, T$ do
    Sample example $((x_i, s_i), y_i)$ from $D$
    Feed $x_i$ into classifiers $C_{s_i}$ and $C_{1-s_i}$
    Compute the loss $L_i$ according to Equation 3
    $\theta_{s_i}^{t+1} \leftarrow \theta_{s_i}^{t} - \alpha \, \partial L_i / \partial \theta_{s_i}^{t}$
return $\{\theta_0, \theta_1\}$
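
The per-example objective and the update rule of Algorithm 1 can be sketched in a few lines. This is an illustrative simplification, not the authors' implementation: each classifier is a logistic regression instead of a two-hidden-layer network with dropout, the KL term is assumed to be the KL divergence between two Bernoulli distributions, and the hyperparameter values and toy data are hypothetical. As in the text, only the classifier matching $s_i$ is updated, with the other classifier's output held fixed.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q) (assumed form)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def score(theta, x):
    """Pre-sigmoid output of a linear classifier theta = (weights, bias)."""
    w, b = theta
    return sum(wk * xk for wk, xk in zip(w, x)) + b


def loss(thetas, x, s, y, lam):
    """Cross-entropy of classifier s plus lam times the KL term."""
    p = sigmoid(score(thetas[s], x))
    q = sigmoid(score(thetas[1 - s], x))
    ce = -y * math.log(p) - (1 - y) * math.log(1 - p)
    return ce + lam * bernoulli_kl(p, q)


def train(data, dim, alpha=0.1, lam=1.0, iters=2000, seed=0):
    """Stochastic updates: sample one example, update only classifier s_i."""
    rng = random.Random(seed)
    thetas = [([0.0] * dim, 0.0), ([0.0] * dim, 0.0)]  # theta_0, theta_1
    for _ in range(iters):
        (x, s), y = rng.choice(data)
        z_p = score(thetas[s], x)      # classifier matching s_i
        z_q = score(thetas[1 - s], x)  # other classifier, held fixed
        p = sigmoid(z_p)
        # Gradient w.r.t. z_p: (p - y) from the cross-entropy term and
        # lam * p * (1 - p) * (z_p - z_q) from the Bernoulli KL term.
        g = (p - y) + lam * p * (1 - p) * (z_p - z_q)
        w, b = thetas[s]
        thetas[s] = ([wk - alpha * g * xk for wk, xk in zip(w, x)],
                     b - alpha * g)
    return thetas


def output_gap(thetas, x):
    """Absolute difference between the two classifiers' probabilities."""
    return abs(sigmoid(score(thetas[0], x)) - sigmoid(score(thetas[1], x)))


# Toy data (hypothetical): identical features, labels differ only by group.
data = [(([1.0], 0), 1), (([1.0], 1), 0)]
plain = train(data, dim=1, lam=0.0)
fair = train(data, dim=1, lam=5.0)
gap_plain = output_gap(plain, [1.0])
gap_fair = output_gap(fair, [1.0])
```

On this toy data the two classifiers drift far apart when the fairness weight is zero, while a positive weight keeps their outputs for the same input close, at the cost of fitting the group-specific labels less tightly.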


5. EXPERIMENTAL PROTOCOL 
5.1 Datasets 


To evaluate the proposed model, we collect ten years of data at George Mason University, from Fall 2009 to Fall 2019. We choose the top five majors: Biology (BIOL), Civil Engineering (CEIE), Computer Science (CS), Electrical Engineering (ECE) and Psychology (PSYC). We only choose a course if at least 300 students have taken it. We use a student’s grades in prior courses to predict whether the student is at-risk of failing a target course. While preprocessing the data, we exclude courses that are not relevant to a major, such as elective courses. Table 1 shows statistics of the data. From the table, we can see clear gender differences across majors. Female students tend to choose the Biology and Psychology majors, while male students are more likely to choose engineering majors such as Civil Engineering, Computer Science and Electrical Engineering. Overall, the proportion of African-American students is relatively small, especially for Civil Engineering and Computer Science.


We build course-specific models, namely, for a target course we train a classifier to predict whether a student will fail that course in the future. We define a student as at-risk if the student’s grade is lower than 3.0. Given a target course, the data related to that course is split into 75%, 15%, 15% for training, validation and testing, respectively.


5.2 Baselines 

As in this work we focus on individual fairness, we com- 
pare our proposed model with several individually fair algo- 
rithms. 


5.2.1 Logistic Regression (LR) 

This baseline does not have a fairness constraint. It directly 
predicts if a student is at-risk or not. The input is a feature 
vector encoding a student’s grades in prior courses. The out- 
put is the student’s probability of failing the target course. 


5.2.2 Rawlsian Fairness (Rawlsian) 

The concept of Rawlsian fairness is that a worse candidate should never be favored over a better one. Joseph et al. [21] proposed an individually fair algorithm that uses contextual bandits as a building block to implement Rawlsian fairness.


5.2.3 Learning Fair Representation (LFR) 

The unfairness of a prediction comes from the correlation of the output with the sensitive attribute. Zemel et al. [32] proposed to remove the correlation by learning an intermediate representation and training a classifier on it.


5.2.4 Adversarial Learned Fair Representation (ALFR) 


Edwards et al. [9] proposed to remove sensitive information from the representation by adversarial learning. An encoder maps the original feature vector to a latent embedding vector, from which an adversary tries to predict the sensitive attribute. While the adversary tries to predict the sensitive attribute, the encoder seeks to generate a representation that prevents the adversary from predicting it.


5.3 Evaluation Metrics 

To evaluate whether the proposed algorithm satisfies the accuracy and fairness constraints, we utilize three evaluation metrics: accuracy, discrimination and consistency.


The accuracy metric assesses the predictive accuracy of the model, defined as follows

$$\mathrm{acc} = \frac{\sum_{i=1}^{N} \mathbb{1}(y_i = \hat{y}_i)}{N} \qquad (4)$$

where $N$ is the number of examples, $\hat{y}_i$ is the prediction and $y_i$ is the ground truth label.


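Discrimination measures the difference between the groups’ rates of being predicted as positive, mathematically expressed as follows

$$\mathrm{discri} = \left| \frac{\sum_{i=1}^{N} \mathbb{1}(s_i = 0)\, \hat{y}_i}{\sum_{i=1}^{N} \mathbb{1}(s_i = 0)} - \frac{\sum_{i=1}^{N} \mathbb{1}(s_i = 1)\, \hat{y}_i}{\sum_{i=1}^{N} \mathbb{1}(s_i = 1)} \right| \qquad (5)$$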
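
The accuracy and discrimination metrics can be computed directly from their definitions. The sketch below is a plain-Python illustration; the function names and the toy labels, predictions and group memberships are assumptions for this example.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose prediction matches the label."""
    return sum(1 for y, p in zip(y_true, y_pred) if y == p) / len(y_true)


def discrimination(y_pred, s):
    """Absolute gap between the two groups' rates of positive predictions."""
    rate = {}
    for g in (0, 1):
        group = [p for p, si in zip(y_pred, s) if si == g]
        rate[g] = sum(group) / len(group)
    return abs(rate[0] - rate[1])


# Hypothetical labels, binary predictions and sensitive attributes.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
s = [0, 0, 0, 1, 1, 1]

acc = accuracy(y_true, y_pred)      # 4 of 6 predictions are correct
discri = discrimination(y_pred, s)  # |1/3 - 2/3|
```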


Table 1: Dataset Statistics

Major  #S     #C  #G       #M              #F              #AA           #NAA
BIOL   6,127  16  124,716  1,927 (31.45%)  4,200 (68.55%)  759 (12.39%)  5,368 (87.61%)
CEIE   450    7   23,708   338 (75.11%)    112 (24.89%)    27 (6.00%)    423 (94.00%)
CS     2,430  11  90,819   1,942 (79.92%)  488 (20.08%)    157 (6.46%)   2,273 (93.54%)
ECE    671    10  65,396   575 (85.69%)    96 (14.31%)     66 (9.84%)    605 (90.16%)
PSYC   5,110  17  84,504   1,200 (23.48%)  3,910 (76.52%)  694 (13.58%)  4,416 (86.42%)

#S total number of students, #C number of courses for prediction, #G total number of grades, #M number of male students, #F number of female students, #AA number of African-American students, #NAA number of non-African-American students.


Consistency compares the predicted results of an individual with those of his/her k-nearest neighbors. If the predicted results are close to the results of the neighbors, consistency is high and the algorithm is fair. Consistency is defined as follows

$$\mathrm{consis} = 1 - \frac{1}{N} \sum_{i=1}^{N} \left| \hat{y}_i - \frac{1}{K} \sum_{j \in kNN(x_i)} \hat{y}_j \right| \qquad (6)$$

where $kNN(x_i)$ is the set of $K$ nearest neighbors of individual $i$.
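
A minimal sketch of the consistency score in Equation 6 follows, assuming the form in which each prediction is compared with the mean prediction of its K nearest neighbors. The neighbor lists are taken as given here; the function name and the toy values are hypothetical.

```python
def consistency(y_pred, neighbors):
    """1 minus the mean absolute gap between each prediction and the
    average prediction of that individual's nearest neighbors.

    neighbors[i] lists the indices of the K nearest neighbors of i.
    """
    total = 0.0
    for i, nbrs in enumerate(neighbors):
        nbr_mean = sum(y_pred[j] for j in nbrs) / len(nbrs)
        total += abs(y_pred[i] - nbr_mean)
    return 1.0 - total / len(y_pred)


y_pred = [1, 1, 0, 0]
neighbors = [[1, 2], [0, 2], [3, 1], [2, 0]]  # hypothetical K = 2 lists
score = consistency(y_pred, neighbors)

# Identical predictions everywhere give the maximum score.
uniform_score = consistency([1, 1, 1, 1], neighbors)
```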


We use Gower similarity [13] to measure the similarity between individuals. Gower similarity is defined as

$$\mathrm{Gower}(i, j) = \frac{\sum_{k=1}^{N} w_k S_{ijk}}{\sum_{k=1}^{N} w_k} \qquad (7)$$

where $N$ is the number of features and $w_k$ is the weight of the $k$-th variable (in this paper the weights are set to one); $S_{ijk}$ is the contribution of the $k$-th variable. If the $k$-th variable is continuous, $S_{ijk}$ is defined as

$$S_{ijk} = 1 - \frac{|x_{ik} - x_{jk}|}{r_k} \qquad (8)$$

where $x_{ik}$ is the value of the $k$-th feature of individual $i$ and $r_k$ is the range of values for the $k$-th variable. If the $k$-th variable is categorical, $S_{ijk}$ is 1 if $x_{ik} = x_{jk}$ and 0 otherwise.
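
The Gower similarity and the resulting neighbor selection can be sketched as below. The function names, the convention of passing the feature ranges and the set of categorical feature indices explicitly, and the toy records are assumptions for illustration; ranges of continuous features are assumed to be non-zero.

```python
def gower_similarity(a, b, ranges, categorical, weights=None):
    """Weighted mean of per-feature similarities between records a and b."""
    n = len(a)
    weights = weights or [1.0] * n  # unit weights, as in the paper
    num = 0.0
    for k in range(n):
        if k in categorical:
            s = 1.0 if a[k] == b[k] else 0.0
        else:
            s = 1.0 - abs(a[k] - b[k]) / ranges[k]
        num += weights[k] * s
    return num / sum(weights)


def k_nearest(data, i, k, ranges, categorical):
    """Indices of the k records most similar to record i (excluding i)."""
    others = [j for j in range(len(data)) if j != i]
    others.sort(
        key=lambda j: gower_similarity(data[i], data[j], ranges, categorical),
        reverse=True,
    )
    return others[:k]


# Hypothetical records: two grades on a 0-4 scale and one categorical field.
data = [(4.0, 3.0, "CS"), (3.0, 3.0, "CS"), (0.0, 1.0, "BIOL")]
ranges = [4.0, 4.0, None]  # no range is needed for the categorical feature
categorical = {2}

sim01 = gower_similarity(data[0], data[1], ranges, categorical)
sim02 = gower_similarity(data[0], data[2], ranges, categorical)
nearest = k_nearest(data, 0, 1, ranges, categorical)
```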


6. EXPERIMENTAL RESULTS 
6.1 Results and Analysis 


We train a classifier for each course in a major to predict if a student will fail that course. The predictions are evaluated using accuracy, discrimination and consistency. The results are averaged across the courses in a major. Table 2 shows the experimental results with gender as the sensitive attribute. From the table, we can see that the proposed model MCCM achieves the best performance in mitigating bias in terms of discrimination. It is able to achieve both group fairness and individual fairness, although it is designed for achieving individual fairness. The reason is that group and individual fairness are highly correlated, so achieving one helps achieve the other.


The predictions from the LR model are highly biased as there is no fairness constraint imposed on it, but it performs well with respect to predictive accuracy. On average, the discrimination of LR is 7.3%. Other methods achieve fairness at the cost of accuracy. It is interesting to see that Rawlsian is not able to remove bias and in some cases even leads to more unfair predictions. Rawlsian is based on the idea that a worse candidate should never be favored over a better one, which is implemented by interval chaining, a weak fairness constraint. We can also observe from the table that different majors have different levels of bias, e.g., Psychology has the least bias while Computer Science has the highest bias with respect to the predictions of LR. The experimental results with race as the sensitive attribute are shown in Table 3. The results are similar to those with gender as the sensitive attribute.


6.2 Fine-grained analysis of the bias 

To have a fine-grained view of the bias, we look at the data and predictions at the course level. In this section, we analyze the bias embedded in the data and in the predictions from LR and the proposed model MCCM. Figure 4 shows the fine-grained results with gender as the sensitive attribute. In Figure 4, the data bias is the proportion of at-risk female students minus the proportion of at-risk male students. A positive bias means female students are more likely to be at-risk (or to be predicted as at-risk); a negative bias means the same for male students. For the predictions from the models, the bias is the female students’ average probability of being predicted as at-risk minus that of male students.
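
The two bias quantities used in this analysis share one form, a female mean minus a male mean, applied either to ground-truth labels or to predicted probabilities. The following sketch illustrates this on made-up values; the function names and the "F"/"M" encoding are assumptions for the example.

```python
def group_mean(values, genders, g):
    picked = [v for v, s in zip(values, genders) if s == g]
    return sum(picked) / len(picked)


def gender_bias(values, genders):
    """Female mean minus male mean.

    With at-risk labels this is the data bias; with predicted at-risk
    probabilities it is the bias of a model's predictions.
    """
    return group_mean(values, genders, "F") - group_mean(values, genders, "M")


genders = ["F", "F", "F", "M", "M", "M"]
at_risk = [1, 0, 1, 0, 0, 1]            # hypothetical ground-truth labels
probs = [0.6, 0.4, 0.5, 0.7, 0.5, 0.6]  # hypothetical model outputs

data_bias = gender_bias(at_risk, genders)  # positive: against female students
pred_bias = gender_bias(probs, genders)    # negative: against male students
```

In this toy course the data and the model point in opposite directions, the situation described for some courses in Figure 4.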


First of all, as stated in Section 1, the overall data, such as overall GPA by gender, shows that male students are the disadvantaged group. However, at the course level, different courses have different disadvantaged groups. Figure 4 shows that in some courses male students are less likely to be at-risk. This insight can inform future fairness work in educational data mining: a course-specific model is desirable, considering that different courses have different disadvantaged groups. From the figures, we can also observe that data and machine learning models might have different bias directions. For example, in Figure 4(a), for course CO the data bias is against male students while the predictions of LR and MCCM are biased against female students. In addition, data bias does not necessarily lead to predictive bias. For example, in Figure 4, all the courses show data bias. However, a classifier without a fairness constraint, e.g., logistic regression, has fair predictions in many courses.


7. CONCLUSION AND FUTURE WORK 


The concerns about bias and discrimination of machine learning models are rising with their increasing adoption. In educational settings, we observe bias in a real-world dataset, and machine learning models without fairness constraints exhibit non-negligible bias in their predictions. Machine learning models are intended to aid students with their learning. However, unfair treatment of students can undermine their learning and graduation. To mitigate discrimination in educational data mining, in this paper, we proposed a fair machine learning model satisfying metric free individual fairness. We evaluated the model’s performance in removing unfairness on datasets collected from George Mason University. The results show the efficacy of the model in removing bias. Compared to other domains, educational data mining has its own characteristics. For example, in our dataset, at the university level, the data is biased against male and African-American students. However, at the course level, different courses have different bias directions. This insight suggests that future work on fairness in educational data mining should design course-specific models. In this work, we treat gender and race separately in terms of removing bias. In the future, we want to build models that treat gender and race as sensitive attributes simultaneously.


Table 2: Experimental results with gender as sensitive attribute.
Each cell reports acc(↑)|discri(↓)|consist(↑).

Method    BIOL                  CEIE                  CS                    ECE                   PSYC
LR        0.7662|0.0613|0.8152  0.6761|0.0837|0.7451  0.6628|0.1007|0.7569  0.7545|0.0980|0.7655  0.7769|0.0192|0.9578
Rawlsian  0.5889|0.0807|0.8120  0.6250|0.0866|0.7052  0.5582|0.0913|0.8301  0.6660|0.1498|0.7036  0.7559|0.0960|0.9396
LFR       0.6470|0.0369|0.9691  0.6983|0.0518|0.9631  0.6004|0.0228|0.9463  0.7389|0.0273|0.9912  0.7898|0.0248|0.9865
ALFR      0.6802|0.0202|0.9675  0.7062|0.0240|0.9855  0.6124|0.0134|0.9821  0.7465|0.0114|0.9783  0.7903|0.0125|0.9878
MCCM      0.6774|0.0163|0.9401  0.6415|0.0165|0.9823  0.6180|0.0038|0.9562  0.7394|0.0061|0.9717  0.7868|0.0023|0.9958

acc = accuracy, discri = discrimination, consist = consistency.
↑ means higher is better; ↓ means lower is better.


Table 3: Experimental results with race as sensitive attribute.
Each cell reports acc(↑)|discri(↓)|consist(↑).

Method    BIOL                  CEIE                  CS                    ECE                   PSYC
LR        0.7662|0.1004|0.8152  0.6761|0.1411|0.7451  0.6628|0.1085|0.7569  0.7545|0.1238|0.7655  0.7769|0.0276|0.9578
Rawlsian  0.5854|0.1129|0.7870  0.5849|0.3658|0.7349  0.5561|0.1857|0.8007  0.6999|0.1446|0.7416  0.7608|0.0776|0.9570
LFR       0.6202|0.0569|0.9051  0.7099|0.1722|0.9701  0.6107|0.0599|0.9897  0.7441|0.0800|0.9852  0.7874|0.0172|0.9933
ALFR      0.6850|0.0505|0.9504  0.7274|0.0862|0.9688  0.6129|0.0086|0.9715  0.7435|0.0384|0.9887  0.7898|0.0156|0.9882
MCCM      0.6563|0.0198|0.9340  0.7138|0.0114|0.9828  0.5895|0.0303|0.9968  0.7133|0.0013|0.9986  0.7857|0.0021|0.9974

acc = accuracy, discri = discrimination, consist = consistency.
↑ means higher is better; ↓ means lower is better.


Figure 4: Bias of different courses with gender as sensitive attribute.